
    The fossilized birth-death model for the analysis of stratigraphic range data under different speciation concepts

    A birth-death-sampling model gives rise to phylogenetic trees with samples from the past and the present. Interpreting "birth" as branching speciation, "death" as extinction, and "sampling" as fossil preservation and recovery, this model, also referred to as the fossilized birth-death (FBD) model, gives rise to phylogenetic trees on extant and fossil samples. The model has been mathematically analyzed and successfully applied to a range of datasets at different taxonomic levels, such as penguins, plants, and insects. However, the current mathematical treatment of this model does not allow a group of temporally distinct fossil specimens to be assigned to the same species. In this paper, we provide a general mathematical FBD modeling framework that explicitly takes "stratigraphic ranges" into account, where a stratigraphic range is the lineage interval associated with a single species, extending through time from the first to the last fossil appearance of that species. To assign a sequence of fossil samples in the phylogenetic tree to the same species, i.e., to specify a stratigraphic range, we need to define the mode of speciation. We provide expressions for three common speciation modes: budding (or asymmetric) speciation, bifurcating (or symmetric) speciation, and anagenetic speciation. Our equations allow for flexible joint Bayesian analysis of paleontological and neontological data. Furthermore, our framework is directly applicable to epidemiology, where a stratigraphic range is the observed duration of infection of a single patient, "birth" via budding is transmission, "death" is recovery, and "sampling" is sequencing the pathogen of a patient. Thus, we present a model that allows multiple observations through time from a single patient to be incorporated.
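    The budding case can be made concrete with a forward-time simulation. The sketch below is not the paper's method or any package's API: it assumes constant per-lineage rates (hypothetical names `lam` for speciation, `mu` for extinction, `psi` for fossil sampling), treats each lineage as one species under budding speciation, and records a species' stratigraphic range as the interval between its first and last sampled fossil. `Species` and `simulate_fbd_budding` are likewise hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Species:
    """A species under budding speciation: it persists until extinction,
    while daughter species bud off as new lineages."""
    origin: float
    fossil_times: List[float] = field(default_factory=list)
    extinct_at: Optional[float] = None

    def stratigraphic_range(self) -> Optional[Tuple[float, float]]:
        """First and last fossil appearance, or None if never sampled."""
        if not self.fossil_times:
            return None
        return (min(self.fossil_times), max(self.fossil_times))


def simulate_fbd_budding(lam: float, mu: float, psi: float, t_max: float,
                         seed: Optional[int] = None) -> List[Species]:
    """Gillespie simulation of a constant-rate birth-death-sampling process,
    starting from one species at time 0 and running forward to t_max."""
    rng = random.Random(seed)
    species = [Species(origin=0.0)]
    alive = [0]  # indices of currently living species
    total = lam + mu + psi
    if total <= 0:
        return species
    t = 0.0
    while alive:
        # Waiting time to the next event across all living lineages.
        t += rng.expovariate(total * len(alive))
        if t >= t_max:
            break
        idx = rng.choice(alive)  # all lineages share the same rates
        u = rng.random() * total
        if u < lam:
            # Budding: the parent species continues and a new species starts.
            species.append(Species(origin=t))
            alive.append(len(species) - 1)
        elif u < lam + mu:
            species[idx].extinct_at = t
            alive.remove(idx)
        else:
            # Sampling does not remove the lineage, so one species can
            # accumulate several fossils that together define its range.
            species[idx].fossil_times.append(t)
    return species


if __name__ == "__main__":
    for s in simulate_fbd_budding(lam=1.0, mu=0.5, psi=2.0, t_max=3.0, seed=1):
        print(s.origin, s.stratigraphic_range())
```

    Reading `lam` as transmission, `mu` as recovery and `psi` as sequencing gives the epidemiological interpretation above, with each range spanning one patient's sampled infection. Bifurcating and anagenetic speciation would need different bookkeeping and are not covered by this sketch.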

    Bayesian total evidence dating reveals the recent crown radiation of penguins

    The total-evidence approach to divergence-time dating uses molecular and morphological data from extant and fossil species to infer phylogenetic relationships, species divergence times, and macroevolutionary parameters in a single coherent framework. Current model-based implementations of this approach lack an appropriate model for the tree describing the diversification and fossilization process and can produce estimates that lead to erroneous conclusions. We address this shortcoming by providing a total-evidence method implemented in a Bayesian framework. This approach uses a mechanistic tree prior to describe the underlying diversification process that generated the tree of extant and fossil taxa. Previous attempts to apply the total-evidence approach have used tree priors that do not account for the possibility that fossil samples may be direct ancestors of other samples. The fossilized birth-death (FBD) process explicitly models the diversification, fossilization, and sampling processes and naturally allows for sampled ancestors. This model was recently applied to estimate divergence times based on molecular data and fossil occurrence dates. We incorporate the FBD model and a model of morphological trait evolution into a Bayesian total-evidence approach to dating species phylogenies. We apply this method to extant and fossil penguins and show that modern penguins radiated much more recently than previously estimated, with the basal divergence in the crown clade occurring at ~12.7 Ma and most splits leading to extant species occurring in the last 2 million years. Our results demonstrate that including stem-fossil diversity can greatly improve the estimates of the divergence times of crown taxa. The method is available in BEAST2 (v. 2.4, www.beast2.org) with the packages SA (v. 1.1.4 or later) and morph-models (v. 1.0.4 or later). Comment: 50 pages, 6 figures
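    The abstract does not name the model of morphological trait evolution. The sketch below assumes an equal-rates Mk model, a common choice for discrete characters, and is not taken from the morph-models package. Branch lengths are in expected changes per character, ascertainment-bias corrections are ignored, and the function names are hypothetical. It gives the transition probabilities and the likelihood of one character observed at two tips separated by path length `t`.

```python
import math


def mk_transition_prob(k: int, t: float, same: bool) -> float:
    """Equal-rates Mk transition probability for k states over branch length t.

    The rate of leaving any state is normalized to 1, so each of the k - 1
    alternative states is entered at rate 1 / (k - 1).
    """
    if k < 2:
        raise ValueError("need at least two character states")
    decay = math.exp(-k * t / (k - 1))
    if same:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def two_taxon_likelihood(state_a: int, state_b: int, k: int, t: float) -> float:
    """Likelihood of one character at two tips separated by total path length t,
    assuming a uniform (stationary) root distribution; by reversibility only
    the path length between the tips matters."""
    return mk_transition_prob(k, t, state_a == state_b) / k


if __name__ == "__main__":
    # A binary character that differs between two taxa 0.3 units apart.
    print(two_taxon_likelihood(0, 1, k=2, t=0.3))
```

    As `t` grows, every transition probability approaches `1/k`, so a character carries less information about long branches; in a total-evidence analysis such per-character likelihoods are combined with the molecular likelihood and the FBD tree prior.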

    Probabilistic Graphical Model Representation in Phylogenetics

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (1) reproducibility of an analysis, (2) model development, and (3) software design. Moreover, a unified, clear, and understandable framework for model representation lowers the barrier for beginners and non-specialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well suited for teaching statistical models, for facilitating communication among phylogeneticists, and for developing generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis-Hastings or Gibbs sampling of the posterior distribution.
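    The core idea of factorizing a model into conditional distributions can be sketched in a few lines. This is a hypothetical toy, not the paper's notation and not the API of RevBayes or any other package: each node stores a value, its parents and a conditional log-density, the joint log-density is the sum over nodes, and a list comprehension stands in for a plate of exchangeable branch lengths. Tree plates are not modeled.

```python
import math
from typing import Callable, List, Optional, Sequence


def exponential_logpdf(x: float, rate: float) -> float:
    """Log density of Exponential(rate) at x."""
    return math.log(rate) - rate * x if x >= 0 else -math.inf


class Node:
    """A DAG node: a constant (no density) or a stochastic node whose
    density is conditioned only on the values of its parents."""

    def __init__(self, name: str, value: float, parents: Sequence["Node"] = (),
                 logpdf: Optional[Callable[..., float]] = None):
        self.name = name
        self.value = value
        self.parents = list(parents)
        self.logpdf = logpdf

    def log_prob(self) -> float:
        if self.logpdf is None:  # constant node contributes nothing
            return 0.0
        return self.logpdf(self.value, *(p.value for p in self.parents))


def joint_log_prob(nodes: List[Node]) -> float:
    """log p(x_1, ..., x_n) = sum_i log p(x_i | parents(x_i))."""
    return sum(node.log_prob() for node in nodes)


# Toy model: a constant hyperparameter, a rate with an exponential prior,
# and a "plate" of branch lengths that are i.i.d. given the rate.
hyper = Node("hyper", 10.0)
rate = Node("rate", 0.5, parents=[hyper], logpdf=exponential_logpdf)
branches = [Node(f"b{i}", v, parents=[rate], logpdf=exponential_logpdf)
            for i, v in enumerate([0.1, 0.4, 1.2])]
model = [hyper, rate, *branches]
print(joint_log_prob(model))
```

    Changing `rate.value` alters only the terms for `rate` and its children, which is the locality that Metropolis-Hastings or Gibbs updates exploit when they recompute the posterior after a proposal.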

    RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language.

    Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]

    Assessing the impact of incomplete species sampling on estimates of speciation and extinction rates

    Estimating speciation and extinction rates is essential for understanding past and present biodiversity, but is challenging given the incompleteness of the rock and fossil records. Interest in this topic has led to a divergent suite of independent methods: paleontological estimates based on sampled stratigraphic ranges, and phylogenetic estimates based on the observed branching times in a given phylogeny of living species. The fossilized birth–death (FBD) process is a model that explicitly recognizes that the branching events in a phylogenetic tree and sampled fossils were generated by the same underlying diversification process. A crucial advantage of this model is that it incorporates the possibility that some species may never be sampled. Here, we present an FBD model that estimates tree-wide diversification rates from stratigraphic range data when the underlying phylogeny of the fossil taxa may be unknown. The model can be applied when only occurrence data for taxonomically identified fossils are available, but still accounts for the incomplete phylogenetic structure of the data. We tested this new model using simulations, focusing on how inferences are affected by incomplete fossil recovery. We compared our approach with a phylogenetic model that does not incorporate incomplete species sampling and with three fossil-based alternatives for estimating diversification rates, including the widely implemented boundary-crosser and three-timer methods. Our simulations demonstrate that estimates under the FBD model are robust and more accurate than those of the alternative methods, particularly when fossil data are sparse, because the FBD model incorporates incomplete species sampling explicitly.
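    For comparison, the boundary-crosser method can be sketched from its usual per-capita formulas: with N_b taxa crossing the bottom boundary of an interval of length Δt, N_t crossing the top and N_bt crossing both, origination is ln(N_t/N_bt)/Δt and extinction is ln(N_b/N_bt)/Δt. The function name and the forward time axis are assumptions for illustration. Observed ranges are taken at face value, which is exactly the incomplete-sampling issue the FBD model addresses.

```python
import math
from typing import Iterable, Optional, Tuple


def boundary_crosser_rates(ranges: Iterable[Tuple[float, float]],
                           bottom: float, top: float
                           ) -> Tuple[Optional[float], Optional[float]]:
    """Per-capita origination and extinction rates for the interval (bottom, top).

    ranges: (first, last) appearance pairs on a time axis that increases
    toward the present. A taxon crosses a boundary if its range extends
    strictly before and after it. Returns (None, None) when no taxon ranges
    through the whole interval, since both rates are then undefined.
    """
    if not bottom < top:
        raise ValueError("bottom must be earlier than top")
    n_b = n_t = n_bt = 0
    for first, last in ranges:
        crosses_bottom = first < bottom < last
        crosses_top = first < top < last
        n_b += crosses_bottom
        n_t += crosses_top
        n_bt += crosses_bottom and crosses_top
    if n_bt == 0:
        return None, None
    dt = top - bottom
    # Top-crossers absent at the bottom boundary originated within the interval.
    origination = math.log(n_t / n_bt) / dt
    # Bottom-crossers absent at the top boundary went extinct within the interval.
    extinction = math.log(n_b / n_bt) / dt
    return origination, extinction


ranges = [(0.0, 10.0), (0.0, 10.0), (3.0, 10.0), (0.0, 4.0)]
print(boundary_crosser_rates(ranges, bottom=2.0, top=6.0))
```

    Here three taxa cross each boundary and two cross both, so both rates equal ln(3/2)/4. Taxa confined to the interval (singletons) never enter these counts, one reason the method is sensitive to sparse fossil data.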

    Inferring Processes of Coevolutionary Diversification in a Community of Panamanian Strangler Figs and Associated Pollinating Wasps

    The obligate mutualism between figs and their pollinating wasps is diverse (~750 described species), ecologically important, and ancient (~80-90 Ma), providing model systems for generating and testing many questions in evolution and ecology. Once thought to be a prime example of strict one-to-one cospeciation, current thinking suggests that genera of pollinator wasps coevolve with corresponding subsections of figs, but the degree to which cospeciation or other processes contribute to the association at finer scales is unclear. Here we use genome-wide sequence data from a community of Panamanian strangler figs (Ficus subgenus Urostigma, section Americana) and associated fig wasp pollinators (Pegoscapus spp.) to infer the process of coevolutionary diversification in this obligate mutualism. Using a model-based approach adapted from the study of gene family evolution, our results indicate pervasive and ongoing host switching of pollinator wasps at this fine phylogenetic and regional scale. Although the model estimates a modest amount of cospeciation, simulations reveal this signal to be consistent with levels of co-association expected under a model of free host switching. Our findings provide an outline for testing how ecological and evolutionary processes can be modeled to evaluate the history of association of interacting lineages in a phylogenetic framework.

    Taming the BEAST—A Community Teaching Material Resource for BEAST 2

    Phylogenetics and phylodynamics are central topics in modern evolutionary biology. Phylogenetic methods reconstruct the evolutionary relationships among organisms, whereas phylodynamic approaches reveal the underlying diversification processes that lead to the observed relationships. These two fields have many practical applications in disciplines as diverse as epidemiology, developmental biology, palaeontology, ecology, and linguistics. The combination of increasingly large genetic data sets and increases in computing power is facilitating the development of more sophisticated phylogenetic and phylodynamic methods. Big data sets allow us to answer complex questions. However, since the required analyses are highly specific to the particular data set and question, a black-box method is no longer sufficient. Instead, biologists are required to be actively involved with modeling decisions during data analysis. The modular design of the Bayesian phylogenetic software package BEAST 2 enables, and in fact enforces, this involvement. At the same time, the modular design enables computational biology groups to develop new methods at a rapid rate. A thorough understanding of the models and algorithms used by inference software is a critical prerequisite for successful hypothesis formulation and assessment. In particular, there is a need for more readily available resources aimed at helping interested scientists equip themselves with the skills to confidently use cutting-edge phylogenetic analysis software. These resources will also benefit researchers who do not have access to similar courses or training at their home institutions. Here, we introduce the “Taming the Beast” (https://taming-the-beast.github.io/) resource, which was developed as part of a workshop series bearing the same name, to facilitate the usage of the Bayesian phylogenetic software package BEAST 2.